ABSTRACT


The focus of this basic research award, per the original proposal, was on the development of inference engines to be used within an
overall information integration context, with emphasis on network-related environments. Two sub-areas received the primary
attention in our work. The first was addressed successfully; the second was pursued, found to be in need of modification, and
then pursued successfully in its modified form. In addition, certain related issues regarding the acquisition of information from
networks and its impact on inferential processes were successfully pursued as well. Overall, the research program was highly
successful in achieving its stated goals, as well as in producing results on additional related goals that arose during the life of the
award.
Overview

The focus of this basic research award, per the original proposal, was on the development of inference engines to
be used within an overall information integration context, with emphasis on network-related environments. Two
sub-areas received the primary attention in our work. The first was addressed successfully; the second was
pursued, found to be in need of modification, and then pursued successfully in its modified form. In addition, certain
related issues regarding the acquisition of information from networks and its impact on inferential processes
were successfully pursued as well. Overall, the research program was highly successful in achieving its stated
goals, as well as in producing results on additional related goals that arose during the life of the award.


Specific Achievements

The first sub-area of research focus was on the use of non-parametric volume-based inferential strategies, in
conjunction with dimension reduction techniques. We successfully developed methods to

1. extend the estimation of minimum volume sets from independent observations to dependent
observations, characterizing the necessary mathematics and implementing the corresponding algorithms
in software. [Di and Kolaczyk 2010]

2. augment the basic methodology so that it could be used as a testing/detection device, with corresponding
control of false detection rates. [Scott and Kolaczyk 2007, 2010]

3. use these minimum volume set methods for anomaly detection in moderate- to high-dimensional
computer network traffic settings. [Chhabra et al. 2008]
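
As a rough illustration of the idea behind items 1 to 3, the sketch below computes a simple plug-in estimate of a minimum volume set in one dimension: a Gaussian kernel density estimate is thresholded so that roughly a fraction alpha of the training sample falls inside the set, and new points falling outside are flagged as anomalous. It is not the procedure of the cited papers. It assumes independent observations, and the function names, bandwidth, and simulated data are all hypothetical choices made for illustration.

```python
import math
import random


def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density at x from a 1-D sample."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample
        )

    return density


def mv_set_threshold(sample, density, alpha):
    """Density level such that at least a fraction alpha of the sample
    has estimated density at or above it (plug-in level-set estimate)."""
    values = sorted((density(x) for x in sample), reverse=True)
    k = max(1, math.ceil(alpha * len(values)))
    return values[k - 1]


def is_anomaly(x, density, threshold):
    """A point is flagged when it falls outside the estimated set."""
    return density(x) < threshold


# Hypothetical training data standing in for "normal" traffic measurements.
random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(500)]
density = gaussian_kde(train, bandwidth=0.3)
threshold = mv_set_threshold(train, density, alpha=0.95)
```

With these assumptions, a call such as `is_anomaly(8.0, density, threshold)` flags a point far from the bulk of the training data, while points near the center of the sample are not flagged. Controlling false detection rates, as in item 2, would require additional machinery that this sketch does not attempt.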

The second sub-area was the use of kernel fusion machines with numerous heterogeneous inputs, in conjunction
with variable selection techniques and diagnostic tools. This thrust was pursued primarily within the context of
large-scale biological databases, such as arise in computational biology, which were felt to be representative of
many sources of network-based data more broadly. Initial studies there found that (i) the standard
kernel methods and analogous probability-based methods performed similarly, and (ii) there were substantial
sources of uncertainty in this type of data, for which work on extensions of probability-based methods would be
much more likely to yield natural solutions. The underlying technical machinery was therefore shifted to integrative
probability models, rather than integrative kernel methods. There we successfully developed methods to

1. integrate relational and hierarchical network data in a probabilistic framework that enforces coarse-to-fine
hierarchical class relationships in classification, applied to protein function prediction in the context
of the Gene Ontology network and protein interaction networks. [Jiang, Nariai, Steffen, Kasif, and
Kolaczyk 2008]

2. integrate multiple sources of heterogeneous data types (i.e., both network and non-network, continuous
and categorical), using naive-Bayes and conditional naive-Bayes techniques, and implement these for the task
of protein function prediction. [Nariai, Kolaczyk, and Kasif 2007; Jiang et al. 2008; Jiang and Kolaczyk
2010]

3. incorporate into the network-based process prediction problem (inherent in items 1 and 2 above) uncertainty
information at the level of both network topology and data on the network-indexed process, yielding an
ability to use inconsistencies between topology and process information to correct for such uncertainties.
[Jiang, Gold, and Kolaczyk 2010]
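
To make the naive-Bayes integration in item 2 concrete, the following minimal sketch combines one categorical evidence source and one continuous evidence source into a posterior probability for a binary class, by adding per-source log-likelihood ratios to the prior log-odds. It is an illustration under stated assumptions, not the models of the cited papers: the feature names, the Gaussian form for the continuous feature, and all parameter values are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution with the given mean and std."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - 0.5 * ((x - mean) / std) ** 2


def naive_bayes_posterior(prior, cat_value, cat_probs, cont_value, cont_params):
    """Posterior P(class = 1 | evidence), assuming the two evidence sources
    are conditionally independent given the class (the naive-Bayes assumption).

    cat_probs[c][v]  : P(categorical feature = v | class = c)
    cont_params[c]   : (mean, std) of the continuous feature given class c
    """
    log_odds = math.log(prior / (1.0 - prior))
    # Contribution of the categorical source.
    log_odds += math.log(cat_probs[1][cat_value] / cat_probs[0][cat_value])
    # Contribution of the continuous source.
    log_odds += gaussian_logpdf(cont_value, *cont_params[1])
    log_odds -= gaussian_logpdf(cont_value, *cont_params[0])
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical parameters: class 1 means "protein has the function".
prior = 0.1
cat_probs = {
    1: {"interacts": 0.7, "none": 0.3},
    0: {"interacts": 0.2, "none": 0.8},
}
cont_params = {1: (2.0, 1.0), 0: (0.0, 1.0)}
posterior = naive_bayes_posterior(prior, "interacts", cat_probs, 1.5, cont_params)
```

Because each source enters only through its own likelihood ratio, further heterogeneous sources can be added as extra terms in the same sum, which is the main appeal of this style of integration.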


Beyond the work in these two sub-areas, which constituted the main focus of our efforts, we also pursued a third
line of research, focused on the problem of inferring association networks from temporally indexed data.
Motivated by the need to infer so-called ‘functional connectivity’ networks in neuroscience,
based on scalp-level voltage potential measurements, we

1. showed that statistical summaries of networks built by integrating measurements at multiple
measurement sites provide potentially valuable predictive information on functional disabilities in
epilepsy patients. [Kramer, Kolaczyk, and Kirsch 2008]

2. developed an inference and control procedure for creating such networks at multiple time points over a
time period in a manner that maintains consistent levels of topological uncertainty throughout. [Kramer,
Eden, Cash, and Kolaczyk 2009]
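
A bare-bones version of the association network idea can be sketched as follows: compute a pairwise association measure between the time series recorded at each site, and draw an edge wherever the association exceeds a cutoff. The sketch uses Pearson correlation and a fixed threshold purely as assumptions for illustration. The cited work concerns a more careful inference and control procedure, which is not reproduced here, and the site names and signals are hypothetical.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def association_network(series, threshold):
    """Return the set of undirected edges (u, v), u < v, whose absolute
    correlation is at least the threshold."""
    names = sorted(series)
    edges = set()
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if abs(pearson(series[u], series[v])) >= threshold:
                edges.add((u, v))
    return edges


# Hypothetical signals: sites a and b share a common oscillation, site c does not.
t = [k / 10.0 for k in range(200)]
series = {
    "site_a": [math.sin(s) for s in t],
    "site_b": [math.sin(s) + 0.1 * math.cos(7.0 * s) for s in t],
    "site_c": [math.cos(3.0 * s) for s in t],
}
edges = association_network(series, threshold=0.8)
```

Repeating this construction over successive time windows yields a sequence of networks. A fixed threshold gives no guarantee about the rate of falsely declared edges from window to window, which is the kind of issue the procedure in item 2 is described as addressing.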

Finally, during the period of this award, the PI wrote and published a book in the general topic area of statistical
analysis of network data [Kolaczyk 2009], which relies in part on many of the projects supported under both this
ONR award and the PI's previous award with ONR. This book is the first of its kind, being a comprehensive
survey of statistical methods for network analysis written, contrary to most discipline-specific treatments, from
the perspective of the statistics itself. Organized according to a statistical taxonomy, the book covers both
descriptive/exploratory and inferential statistical methods for network data. At the most recent Joint Statistical
Meetings, held in Washington DC in 2009 and attended by over 6000 statisticians, the book was among the top
ten sellers for Springer, the largest and one of the most prominent publishers in statistics.